Improvements in machine translation for English/Iraqi speech translation

Authors

  • Shirin Saleem
  • Krishna Subramanian
  • Rohit Prasad
  • David Stallard
  • Chia-Lin Kao
  • Premkumar Natarajan
  • Raid Suleiman
Abstract

In this paper, we describe techniques for improving machine translation quality in the context of speech-to-speech translation for significantly different language pairs. Specifically, we explore three broad approaches for improving translation from English to Iraqi and vice versa. First, we investigate normalization techniques that address the differences between the spoken and written forms of both languages. Second, we incorporate additional knowledge sources, such as a bilingual lexicon and named entity detection, into the translation process. Third, we exploit the rich morphological structure of Iraqi Arabic using two different approaches. The first approach decomposes words in Iraqi Arabic, whereas the second, a novel one, inflects English by combining key phrases into words using the minimum descriptive length criterion. Significant gains in accuracy are observed when translating from text as well as from speech recognition output.

Similar resources

Colloquial Iraqi ASR for speech translation

In this paper we describe a real-time speech recognition system developed for colloquial Iraqi Arabic. This system is currently used in our speech-to-speech translation system configured for bi-directional communication in English and Iraqi on a laptop. We present experimental results on Iraqi utterances from different speech-to-speech translation domains, and analyze the usefulness of acoustic...

Recent advances in SRI's IraqComm™ Iraqi Arabic-English speech-to-speech translation system

We summarize recent progress on SRI's IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system. In the past year we made substantial developments in our speech recognition and machine translation technology, leading to significant improvements in both accuracy and speed of the IraqComm system. On the 2008 NIST-evaluation dataset our two-way speech-to-text (S2T) system achieved...

Fixed Length Word Suffix for Factored Statistical Machine Translation

Factored Statistical Machine Translation extends the phrase-based SMT model by allowing each word to be a vector of factors. Experiments have shown the effectiveness of many factors, including part-of-speech tags, in improving the grammaticality of the output. However, high-quality part-of-speech taggers are not available in the open domain for many languages. In this paper we used fixed length word...

A Wearable Headset Speech-to-Speech Translation System

In this paper we present a wearable, headset-integrated, eyes- and hands-free speech-to-speech (S2S) translation system. The S2S system described here is configured for translingual communication between English and colloquial Iraqi Arabic. It employs an n-gram speech recognition engine, a rudimentary phrase-based translator for translating recognized Iraqi text, and a rudimentary text-to-speech (TT...

Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources

This paper presents an English-Iraqi Arabic speech-to-speech statistical machine translation system using limited resources. In it, we explore the constraints involved, how we endeavored to mitigate such problems as a non-standard orthography and a highly inflected grammar, and discuss leveraging existing plentiful resources for Modern Standard Arabic to assist in this task. These combined tech...

Publication date: 2007